ThermoPCD: a database of molecular dynamics trajectories of antibody–antigen complexes at physiologic and fever-range temperatures

Abstract
Progression of various cancers and autoimmune diseases is associated with changes in systemic or local tissue temperatures, which may affect current therapies. The role of fever-range and acute inflammation-range temperatures in the stability and activity of antibodies relevant for cancers and autoimmunity is unknown. To produce molecular dynamics (MD) trajectories of immune complexes at relevant temperatures, we used the Research Collaboratory for Structural Bioinformatics (RCSB) database to identify 50 antibody:antigen complexes of interest, as well as single antibodies and antigens, and used the Groningen Machine for Chemical Simulations (GROMACS) to prepare and simulate the structures at different temperatures for 100–500 ns, with single or multiple random seeds. The MD trajectories are freely available. Processed data include Protein Data Bank (PDB) files extracted every 50 ns from all trajectories, and binding free energy calculations for some of the immune complexes. Protocols for using the data are also available. Each individual dataset has a unique DOI. We created a web interface, ThermoPCD, as a platform to explore the data. ThermoPCD outputs allow users to relate temperature-dependent changes in epitope:paratope interfaces to the corresponding binding free energies, or to their own experimentally derived binding affinities. ThermoPCD is a free-to-use database of immune complex trajectories at different temperatures that requires no registration and makes all data available for download. Database URL: https://sites.google.com/view/thermopcd/home


Introduction
Fever is an evolutionarily conserved innate response that contributes to survival during infections or other pathological conditions and is generally restricted to a range of 38–40 °C. Once infectious agents have been excluded, fever of unknown origin that proves to be secondary to an underlying cancer is a clinically relevant concern (1). Different cancers and their treatments induce changes in the patients' core body temperature (normally 37 °C). The resulting fever response has three main causes. First, fever is a common side effect of immune checkpoint inhibitor (ICI) therapy (2). Second, the fever may be neoplastic, as with most haematologic cancers (3). Last, neutropenic fever is a common reaction to chemotherapy (4). Notably, the inflammatory milieu of a tumour is itself up to 2 °C warmer than the surrounding healthy tissue. For instance, inner tumour temperatures are 1–2 °C above the physiological core body temperature in lung, bladder, breast, skin and brain cancers (5).
Fever is thus an unavoidable aspect of cancer progression and of the immune response to cancer, and its role in ICI therapy remains contested. For instance, during tumour growth, signalling regulated by the PD-1/PD-L1 pathway is also associated with substantial inflammatory effects (6). Accordingly, sustained fever associated with anti-PD-1 mAb monotherapy appears to be a poor prognostic factor for patients (7). Further, inflammation can promote resistance to ICI in some cancer models (8). Conversely, antipyretic medication has a marked adverse effect on ICI efficacy (5).
In addition, many autoimmune diseases have an inflammatory component, with direct evidence that the temperature of inflamed tissue can be up to 2 °C higher than the average physiological temperature (9).
It is therefore important to establish the roles of inflammation and fever-range temperatures (38–40 °C, 311–313 K) in the structure and activity of monoclonal antibodies used in the therapy of various cancers. A growing body of data indicates that the binding affinity of mature antibodies measured in vitro is enhanced at 40 °C compared with 37 °C (10), while in silico results suggest either positive (11) or negative (12) effects of higher temperatures on the activity of various antibodies that bind cancer or autoimmune targets.
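Because clinical temperatures are reported in degrees Celsius while the simulations are set in kelvin, a small conversion helper may be useful when matching the two. The Python sketch below uses the exact offset T(K) = T(°C) + 273.15; the helper names are our own, and the 311–313 K range quoted above is a rounded form of 311.15–313.15 K.

```python
# Convert between degrees Celsius and kelvin (T[K] = T[°C] + 273.15)
# to relate clinical temperatures to simulation set-points.


def celsius_to_kelvin(t_c: float) -> float:
    """Return the absolute temperature in kelvin for a Celsius value."""
    return t_c + 273.15


def kelvin_to_celsius(t_k: float) -> float:
    """Return the Celsius value for an absolute temperature in kelvin."""
    return t_k - 273.15


if __name__ == "__main__":
    for label, t_c in [("physiological core", 37.0),
                       ("fever range, lower bound", 38.0),
                       ("fever range, upper bound", 40.0)]:
        print(f"{label}: {t_c:.1f} °C = {celsius_to_kelvin(t_c):.2f} K")
```

By the same offset, simulation set-points of 310–313 K correspond to 36.85–39.85 °C.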
Among the many biophysical and biochemical techniques used to study protein interactions, MD simulations provide the highest temporal and spatial resolution, with the additional advantage of accounting for the context dependency (e.g. temperature) of these processes. Their key drawback is computational cost, which generally prohibits large-scale and/or systematic analyses. Nevertheless, a few initiatives have already produced and analysed MD simulations of proteins in their native state, with the aim of describing global flexibility that may be relevant for their function (13–17). While this area of research remains active, comparative analyses of data obtained by independent groups can be difficult because of the different software packages and/or force fields used in the simulations. To the best of our knowledge, only three public databases of MD simulations provide general datasets for soluble proteins: MoDEL (13), Dynameomics (14) and ATLAS (16). Other relevant databases exist but focus on coarse-grained simulations of proteins or on particular protein classes, such as those from SARS-CoV-2 (18). However, while ATLAS is available, MoDEL is no longer updated and Dynameomics is currently inaccessible. Consequently, there is an unmet need to guarantee users permanent access to each such MD simulation, for instance in the form of unique digital object identifiers (DOIs).
Furthermore, despite the potential uses of these data in, e.g., biotechnology or drug design, MD simulations of multiple proteins have traditionally been too expensive to run on long timescales (>100 ns), and prohibitively expensive when the proteins must be simulated at multiple temperatures, for longer times and from different initial random seeds. Recently, two developments have emerged that may help overcome the issue of cost: on one hand, cloud-based supercomputing has greatly expanded the resources and data-sharing options available to researchers (19); on the other hand, machine learning-based computational approaches are extending classical MD simulations to time scales beyond those reached with coarse-grained dynamics (20,21).
Here, we introduce ThermoPCD, a database of MD trajectories of 50 protein complexes formed by therapeutic antibodies or autoantibodies with targets relevant for cancers and autoimmune diseases, respectively, at different pertinent temperatures. In addition, some MD trajectories describe only the antibodies or the antigens under the same thermal conditions as the corresponding immune complexes. In ThermoPCD, each immune complex, antigen or antibody has been simulated at a minimum of three temperature points (310–312 K, plus 313 K for most simulations) for 100–500 ns, in one or more independent runs. Each trajectory of a given protein or protein complex at a given temperature has been assigned a unique and permanent DOI in the free-to-use Harvard Dataverse (https://dataverse.harvard.edu). ThermoPCD incorporates a tool for searching by antigen, antibody and/or PDB code to facilitate use.

Database construction: structural data sources
Target antigens pertinent to cancers and autoimmune diseases were drawn in part from previous publications (22,23). We further filtered the results to focus on pathological conditions in which fever has a documented clinical role and for which crystal structures of the antigens in complex with monoclonal antibodies (mAbs) were available. Immune complexes were identified and extracted from the Research Collaboratory for Structural Bioinformatics (RCSB) Protein Data Bank (PDB), with most structures having an X-ray resolution of ≤3 Å. Most of the selected experimental protein structures are monomers, while a few are functional as multimers. All antigens are human unless otherwise specified.
An overview of the structures currently present in ThermoPCD is shown below in Table 1.
Most simulations comprise antibody:antigen complexes, but we also performed simulations of a full-length antibody alone (pembrolizumab, PDB: 5DK3) and of antigens with or without a bound antibody (PDB: 5MV4, 5OCX, 6BFS, 6SF6). Because of its clinical importance, we also simulated the PD-1:PD-L1 complex (PDB: 4ZQK). Owing to the relative ease of crystallizing antibody Fab fragments, and hence their preponderance in the RCSB database, antigens in complex with Fabs predominate in our datasets. Importantly, to date there are very few structures of intact, full-length IgG1 or IgG2 antibodies (24), but we made use of a full-length IgG4 structure (pembrolizumab) for comparisons with other immune complexes that contain PD-1 (PDB: 7E9B, 6J14, 7WSL, 5GRJ in our data).
Besides antigen targets that are mainly cytokines and chemokines, ThermoPCD also includes PD-1, PD-L1 and CTLA-4, which are relevant for immunotherapy, in complex with some of the most commonly used blockbuster ICIs. The data also include binding free energy calculations for 6 of the 45 existing entries.

Preparation of protein structures
A detailed experimental protocol is provided in Supplementary Information 1. Briefly, unless otherwise noted, water and ligand molecules were removed from the immune complexes to ensure protocol uniformity. Where residues were missing, MODELLER v10.1 was used to reconstruct them as loop regions (25). All-atom MD simulations were performed with the GROningen MAchine for Chemical Simulations (GROMACS 2020; http://www.gromacs.org/) using the CHARMM27 (CHARMM22 plus CMAP for proteins) force field with periodic boundary conditions. The structures were solvated in a truncated octahedron box with the simple point charge (SPC) water model. The solvated systems were neutralized with Na+ or Cl− counter ions using the tleap program. Particle Mesh Ewald was used to calculate long-range electrostatic interactions. The cut-off distance for the van der Waals energy term was 12.0 Å. The systems were then energy-minimized to a maximum force of 1000.0 kJ/mol/nm using 50 000 steps. The solvated and energy-minimized systems were further equilibrated for 100 ps under the NVT and NPT ensembles. We deployed two thermostats: an initial Berendsen thermostat, chosen for its ability to rapidly equilibrate large protein complexes at the temperatures of interest, followed by the V-rescale (Bussi-Donadio-Parrinello) thermostat for the MD production runs. A diagram of the workflow for producing the MD trajectories is presented in Figure 1.
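The preparation steps above map onto GROMACS run-parameter (.mdp) files. The Python sketch below writes minimal .mdp text for each stage (minimisation, NVT and NPT equilibration with a Berendsen thermostat, V-rescale production). The 2 fs time step, coupling constants, h-bond constraints, reference pressure and barostat choices are our assumptions, not values taken from the ThermoPCD protocol in Supplementary Information 1, and settings a full protocol would need (output frequencies, CHARMM van der Waals modifiers, etc.) are omitted.

```python
# Hypothetical sketch: emit GROMACS .mdp text per simulation stage.
# Only the PME electrostatics, the 1.2 nm (12.0 Å) van der Waals cut-off,
# the 1000 kJ/mol/nm minimisation tolerance and the thermostat types come
# from the text; every other value is an illustrative assumption.

COMMON = {
    "cutoff-scheme": "Verlet",
    "pbc": "xyz",
    "coulombtype": "PME",   # Particle Mesh Ewald for long-range electrostatics
    "rcoulomb": "1.2",      # nm, assumed equal to rvdw
    "rvdw": "1.2",          # nm, i.e. the 12.0 Å van der Waals cut-off
}


def mdp_text(stage: str, temperature_k: float = 310.0,
             nsteps: int = 50000, dt_ps: float = 0.002) -> str:
    """Return .mdp text for stage 'em', 'nvt', 'npt' or 'md'."""
    params = dict(COMMON)
    if stage == "em":
        params.update({"integrator": "steep",   # assumed minimiser
                       "emtol": "1000.0",       # kJ/mol/nm
                       "nsteps": str(nsteps)})
    elif stage in ("nvt", "npt", "md"):
        params.update({
            "integrator": "md",
            "dt": str(dt_ps),
            "nsteps": str(nsteps),
            "constraints": "h-bonds",
            # Berendsen during equilibration, V-rescale for production
            "tcoupl": "berendsen" if stage in ("nvt", "npt") else "V-rescale",
            "tc-grps": "Protein Non-Protein",
            "tau-t": "0.1 0.1",
            "ref-t": f"{temperature_k:g} {temperature_k:g}",
        })
        if stage == "nvt":
            params["pcoupl"] = "no"
        else:
            params.update({
                "pcoupl": "Berendsen" if stage == "npt" else "Parrinello-Rahman",
                "ref-p": "1.0",
                "tau-p": "2.0",
                "compressibility": "4.5e-5",
            })
    else:
        raise ValueError(f"unknown stage: {stage!r}")
    return "\n".join(f"{key:<16} = {value}" for key, value in params.items()) + "\n"


if __name__ == "__main__":
    # 100 ps of NVT equilibration at 313 K with the assumed 2 fs time step
    print(mdp_text("nvt", temperature_k=313, nsteps=50000))
```

Generating one file per temperature from the same template keeps the protocol uniform across the 310–313 K runs, so that only ref-t (and the random seed) differ between them.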

Database and web interface
All data are stored in the free and open Harvard Dataverse Repository (HDR, https://dataverse.harvard.edu) with full and free public access for downloading the datasets. Permanent DOIs are available for each individual file, ensuring citability and ease of use. All data are open, findable via search engines and permanently stored, which avoids the data unavailability that has affected similar databases (26,27). The experimental X-ray structure of each immune complex is taken as the reference and is also included on the individual pages for all MD trajectories.
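For scripted access, a dataset DOI can be resolved through the Dataverse native API. The Python sketch below builds the metadata URL for a dataset DOI and, optionally, lists its files with direct download links. It assumes the standard Dataverse endpoints (/api/datasets/:persistentId/ and /api/access/datafile/{id}) and the usual JSON layout of the dataset response; the DOI in the comment is a hypothetical placeholder, not a ThermoPCD identifier.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SERVER = "https://dataverse.harvard.edu"


def dataset_metadata_url(doi: str) -> str:
    """Dataverse native-API URL returning JSON metadata for a dataset DOI."""
    pid = doi if doi.startswith("doi:") else f"doi:{doi}"
    return f"{SERVER}/api/datasets/:persistentId/?{urlencode({'persistentId': pid})}"


def datafile_download_url(file_id: int) -> str:
    """Data Access API URL for downloading one file by its numeric id."""
    return f"{SERVER}/api/access/datafile/{file_id}"


def list_files(doi: str) -> list[tuple[str, str]]:
    """Return (file label, download URL) pairs for the latest dataset version.

    Requires network access; assumes the response layout
    {"data": {"latestVersion": {"files": [{"label": ..., "dataFile": {"id": ...}}]}}}.
    """
    with urlopen(dataset_metadata_url(doi)) as response:
        payload = json.load(response)
    files = payload["data"]["latestVersion"]["files"]
    return [(f["label"], datafile_download_url(f["dataFile"]["id"])) for f in files]


# Example (hypothetical DOI; use one listed on a ThermoPCD page instead):
# for label, url in list_files("10.7910/DVN/XXXXXX"):
#     print(label, url)
```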
We created the ThermoPCD website, which links to HDR, using Google Sites. It provides a user-friendly interface where users can browse the whole database, follow hyperlinks to the HDR data and view an overview of all entries. Information is organized and shown in different sections.
Updates and additions will be entered on both the ThermoPCD website and the corresponding HDR repository.
All entries in ThermoPCD contain basic information on the immune complex (resolution, CATH entry, etc.) as well as on the individual antibody and/or antigen (residues forming the CDRs, location of disulphide bonds in antigens, etc.), as presented in Figure 2.

Search for a protein
Within each category, entries can be selected from a drop-down menu by antigen name. Users can also search for proteins in both ThermoPCD and the linked HDR repository by acronym, by part or all of the full name, or by PDB code using the search tool, and can combine different filters. A list of all entries is available for each category, as shown in Figure 3.
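The search behaviour described above (case-insensitive matching on acronyms, partial names or PDB codes, optionally restricted to a category) can be sketched in Python over a local index. This is not the ThermoPCD implementation, which runs on Google Sites; the Entry type, the category label and the four-entry index are hypothetical, with descriptions taken from structures mentioned in this article.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Entry:
    pdb: str          # PDB code of the simulated structure
    description: str  # antigen, antibody or complex name shown to users
    category: str     # hypothetical category label, e.g. "cancer"


# Hypothetical mini-index built from structures named in the text.
INDEX = [
    Entry("4OD2", "Death Receptor 5 in complex with apomab", "cancer"),
    Entry("5DK3", "pembrolizumab, full-length IgG4", "cancer"),
    Entry("6RP8", "CTLA-4 in complex with ipilimumab Fab", "cancer"),
    Entry("4ZQK", "PD-1 in complex with PD-L1", "cancer"),
]


def search(entries: Iterable[Entry], query: str = "",
           category: Optional[str] = None) -> list[Entry]:
    """Return entries whose PDB code or description contains `query`
    (case-insensitive), optionally restricted to one category."""
    q = query.strip().lower()
    hits = []
    for entry in entries:
        if category is not None and entry.category != category:
            continue  # category filter is applied first
        if q in entry.pdb.lower() or q in entry.description.lower():
            hits.append(entry)  # an empty query matches every entry
    return hits
```

Substring matching covers acronyms, partial names and PDB codes with a single rule, and filters combine simply by chaining conditions.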

Protocols to enhance database use
To facilitate use, we introduced protocols for interacting with the data. For users interested in how temperature increases affect global parameters of interactions (e.g. buried surface areas between protein partners or changes in residues at the epitope:paratope interfaces), the downloadable PDB files (a minimum of three structures for all data, starting with the initial structure(s) at 0 ns) allow quick structural comparisons using, e.g., the PDBePISA (Proteins, Interfaces, Structures and Assemblies) server at https://www.ebi.ac.uk/pdbe/pisa. For users interested in other parameters of the interactions between the immune partners, or of individual proteins, such as the RMSD (root-mean-square deviation), RMSF (root-mean-square fluctuation) and Rg (radius of gyration), we provide a protocol with the commands needed to obtain these data (Supplementary Information 2).
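Supplementary Information 2 gives the commands for these analyses (GROMACS itself provides tools such as gmx rms, gmx rmsf and gmx gyrate). To make the quantities explicit, the Python/NumPy sketch below implements their standard definitions on coordinate arrays. It assumes that coordinates have already been parsed from the PDB snapshots and that atoms are matched one-to-one between structures (e.g. Cα atoms only). PDB coordinates are in ångström, so divide by 10 to compare with values quoted in nm.

```python
import numpy as np


def radius_of_gyration(coords, masses=None):
    """Rg = sqrt(sum_i w_i |r_i - r_c|^2 / sum_i w_i); unweighted if masses is None."""
    xyz = np.asarray(coords, dtype=float)
    w = np.ones(len(xyz)) if masses is None else np.asarray(masses, dtype=float)
    center = np.average(xyz, axis=0, weights=w)
    sq_dist = np.sum((xyz - center) ** 2, axis=1)
    return float(np.sqrt(np.average(sq_dist, weights=w)))


def superposed_rmsd(mobile, reference):
    """RMSD between two matched (N, 3) conformations after optimal
    rigid-body superposition (Kabsch algorithm)."""
    p = np.asarray(mobile, dtype=float)
    q = np.asarray(reference, dtype=float)
    p_c = p - p.mean(axis=0)  # remove translation
    q_c = q - q.mean(axis=0)
    u, _, vt = np.linalg.svd(p_c.T @ q_c)  # covariance matrix H = P^T Q
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0  # exclude reflections
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p_c @ rotation.T - q_c
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))


def rmsf(frames):
    """Per-atom RMSF from a (T, N, 3) trajectory that is already superposed
    onto a common reference."""
    x = np.asarray(frames, dtype=float)
    deviation = x - x.mean(axis=0)
    return np.sqrt(np.mean(np.sum(deviation ** 2, axis=2), axis=0))
```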
Furthermore, we include a supporting file (Supplementary Information 3) with a complete protocol for determining binding free energies from our files stored in Harvard Dataverse, based on an example from our database (PDB: 6RP8, CTLA-4 in complex with ipilimumab Fab). Users can follow the commands and use the scripts therein to explore whether and how binding free energies vary with the simulation temperature, and relate these data to changes in other interaction parameters that may be affected (RMSF, etc.).
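The binding free energies in ThermoPCD are obtained with the protocol in Supplementary Information 3. Purely as an illustration, and assuming an end-state method of the MM/PBSA or MM/GBSA type, the per-frame estimate is ΔG_bind = G_complex − G_receptor − G_ligand, averaged over frames. The Python sketch below computes that average and its standard error so that values from different temperatures or seeds can be compared; the function names are hypothetical.

```python
import math
from statistics import mean, stdev


def binding_free_energy(g_complex, g_receptor, g_ligand):
    """Per-frame end-state estimate dG = G_complex - G_receptor - G_ligand.

    Each argument is a sequence of energies (e.g. kJ/mol), one per frame.
    """
    if not len(g_complex) == len(g_receptor) == len(g_ligand):
        raise ValueError("energy series must contain the same number of frames")
    return [c - r - l for c, r, l in zip(g_complex, g_receptor, g_ligand)]


def mean_and_sem(values):
    """Mean and standard error of the mean, treating frames as independent."""
    m = mean(values)
    sem = stdev(values) / math.sqrt(len(values)) if len(values) > 1 else float("nan")
    return m, sem


def compare_temperatures(dg_by_temperature):
    """Map each temperature (K) to (mean dG, SEM), sorted by temperature."""
    return {t: mean_and_sem(dg) for t, dg in sorted(dg_by_temperature.items())}
```

Treating consecutive frames as independent underestimates the uncertainty, because frames sampled close in time are correlated; block averaging, or comparing the means obtained from independent seeds, gives a more conservative error estimate.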

Comparisons between structures
We illustrate the workflow for assessing possible temperature-dependent conformational changes in an antibody of clinical interest (PDB 5DK3, full-length pembrolizumab). This blockbuster mAb is important for immunotherapy in various cancer models, and it is therefore important to determine whether physical factors such as temperature may affect its activity. Pembrolizumab is a very compact molecule owing to the short hinge region between each Fab domain and the Fc (24). As a result, its Fab domains are about 2 nm closer to each other, and also closer to the Fc, than in corresponding IgG1 structures (Figure 1a and b in reference 24). We asked whether, from a structural viewpoint, temperatures attained during fever or acute inflammation may induce conformational changes in this molecule. Based on a first set of simulations (https://sites.google.com/view/thermopcd/cancers/pembrolizumab), we used the structures obtained after 100 ns to determine the extent of Fab domain movements with temperature. Using PyMOL (https://pymol.org/2), we prepared each pembrolizumab structure by highlighting the hinge regions (green and red, respectively, for each heavy chain). For comparison, we superimposed the structures obtained at 310 K and at 313 K onto the original crystal structure, yielding RMSDs of 0.36 nm and 1.52 nm, respectively. This increase in RMSD can be explained by visualizing Figure 4, in which enhanced mobility of the Fab domains is apparent from 310 K to 313 K. This behaviour at higher temperatures resembles the more independent domain motions and interactions known for IgG1 antibodies. Importantly, all structures used here carry the S228P mutation, which is known not only to facilitate interchain disulphide-bond formation but also to restrict Fab conformational changes (24); this restriction appears to be relaxed as the temperature increases into a pathological range. Limiting or enabling these conformational changes in IgG4 antibodies may be important, as these molecules have, on average, lower binding affinities than the IgG1 and IgG2 isotypes (24).
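An RMSD computed after global superposition mixes internal deformation with rigid-body domain motion. A complementary, superposition-independent measure of the Fab movements discussed above is the distance between domain centroids in each snapshot. The Python sketch below computes pairwise centroid distances for named domains; the atom-index ranges that define Fab1, Fab2 and Fc in the commented usage are hypothetical and would have to be taken from the chain and residue numbering of the 5DK3 model.

```python
import numpy as np


def domain_centroid_distances(coords, domains):
    """Pairwise distances between the centroids of named domains.

    coords  : (N, 3) array of atom coordinates from one snapshot
    domains : dict mapping a domain name to a sequence of atom indices
    Returns {(name_a, name_b): distance} in the units of `coords`.
    """
    xyz = np.asarray(coords, dtype=float)
    centroids = {name: xyz[list(idx)].mean(axis=0) for name, idx in domains.items()}
    names = sorted(centroids)
    distances = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            distances[(a, b)] = float(np.linalg.norm(centroids[a] - centroids[b]))
    return distances


# Hypothetical usage: compare the crystal structure with a 100 ns snapshot.
# fab_fc = {"Fab1": range(0, 440), "Fab2": range(440, 880), "Fc": range(880, 1320)}
# d_xtal = domain_centroid_distances(xtal_ca, fab_fc)
# d_313 = domain_centroid_distances(snap313_ca, fab_fc)
# shifts = {pair: d_313[pair] - d_xtal[pair] for pair in d_xtal}
```

Because centroid distances do not depend on how the structures are superimposed, an increase in the Fab–Fab or Fab–Fc distance at 313 K would point to domain rearrangement rather than to an artefact of the fitting.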

Figure 1. Protocol for data preparation and management of the simulations.

Figure 2. Screenshot of the 4OD2 entry (Death Receptor 5 in complex with apomab), with hyperlinks to CATH, Uniprot and RCSB databases, as well as to Harvard Dataverse for accessing the raw data.

Figure 3. Screenshot of the index page with simulations for the structures relevant for autoimmune conditions.

Figure 5. Variations in binding free energies between PD-1 and PD-L2 at different temperatures. Values were obtained every 10 ns throughout the 500 ns-long simulations, for each seed and temperature point.

Table 1. PDB codes, descriptors, number of random seeds and lengths of simulations used in this study (as of March 2024).
a Data at 313 K are not available.
b Binding free energy calculations are available.